Search CORE

R.ROSETTA: an interpretable machine learning framework.

Author: Baltzer Nicholas
Bornelöv Susanne
Diamanti Klev
Feuk Lars
Garbulowski Mateusz
Komorowski Jan
Smolińska Karolina
Stoll Patricia
Øhrn Aleksander
Publication venue: 'Organisation for Economic Co-Operation and Development (OECD)'
Publication date: 07/03/2021

Funder: Uppsala Universitet; doi: http://dx.doi.org/10.13039/501100007051
Funder: Polska Akademia Nauk; doi: http://dx.doi.org/10.13039/501100004382
Funder: Uppsala University

BACKGROUND: Machine learning involves strategies and algorithms that may assist bioinformatics analyses in data mining and knowledge discovery. In several applications, e.g. in the life sciences, it is often more important to understand how a prediction was obtained than to know what prediction was made. To this end, so-called interpretable machine learning has recently been advocated. In this study, we implemented an interpretable machine learning package based on rough set theory. An important aim of our work was to provide statistical properties of the models and their components.

RESULTS: We present the R.ROSETTA package, an R wrapper of the ROSETTA framework. The original ROSETTA functions have been improved and adapted to the R programming environment. The package allows for building and analyzing non-linear interpretable machine learning models. R.ROSETTA gathers combinatorial statistics via rule-based modelling for accessible and transparent results, well suited for adoption within the wider scientific community. The package also provides statistics and visualization tools that help minimize analysis bias and noise. The package is freely available from GitHub at https://github.com/komorowskilab/R.ROSETTA; to illustrate its usage, we applied it to a transcriptome dataset from an autism case-control study. Our tool provided hypotheses for potential co-predictive mechanisms among features that discerned phenotype classes. These co-predictors represented neurodevelopmental and autism-related genes.

CONCLUSIONS: R.ROSETTA provides new insights for interpretable machine learning analyses and knowledge-based systems. We demonstrated that our package facilitated the detection of dependencies among autism-related genes. Although the sample application of R.ROSETTA illustrates transcriptome data analysis, the package can be used to analyze any data organized in decision tables.
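The abstract states that the package rests on rough set theory and works on data organized in decision tables, but it does not describe the internals. The following is a minimal, hypothetical Python sketch of those underlying rough-set ideas: a decision table, indiscernibility classes, lower and upper approximations of a decision class, and certain IF-THEN rules read off the lower approximation. The toy table, the attribute names `geneA`/`geneB`, and the functions `approximations` and `certain_rules` are invented for illustration; they are not part of the R.ROSETTA (R) API, and the sketch omits the reduct search and rule statistics a full rough-set toolkit performs.

```python
from collections import defaultdict

# Hypothetical toy decision table: each object has discretized condition
# attributes (expression levels of invented genes) and a decision (class).
TABLE = [
    {"geneA": "high", "geneB": "low",  "decision": "autism"},
    {"geneA": "high", "geneB": "low",  "decision": "control"},
    {"geneA": "low",  "geneB": "low",  "decision": "control"},
    {"geneA": "high", "geneB": "high", "decision": "autism"},
    {"geneA": "low",  "geneB": "high", "decision": "control"},
]
CONDITIONS = ("geneA", "geneB")


def indiscernibility_classes(table, attributes):
    """Group object indices that share identical values on `attributes`."""
    classes = defaultdict(set)
    for i, row in enumerate(table):
        key = tuple(row[a] for a in attributes)
        classes[key].add(i)
    return list(classes.values())


def approximations(table, attributes, decision_value):
    """Return (lower, upper) rough-set approximations of one decision class."""
    target = {i for i, row in enumerate(table) if row["decision"] == decision_value}
    lower, upper = set(), set()
    for eq_class in indiscernibility_classes(table, attributes):
        if eq_class <= target:   # class lies entirely inside the target set
            lower |= eq_class
        if eq_class & target:    # class overlaps the target set
            upper |= eq_class
    return lower, upper


def certain_rules(table, attributes, decision_value):
    """Emit one IF-THEN rule per indiscernibility class in the lower approximation."""
    lower, _ = approximations(table, attributes, decision_value)
    rules = []
    for eq_class in indiscernibility_classes(table, attributes):
        if eq_class <= lower:
            row = table[min(eq_class)]  # all members share the same condition values
            cond = " AND ".join(f"{a}={row[a]}" for a in attributes)
            rules.append(f"IF {cond} THEN decision={decision_value}")
    return rules


if __name__ == "__main__":
    low, up = approximations(TABLE, CONDITIONS, "autism")
    print("lower:", sorted(low), "upper:", sorted(up))
    for rule in certain_rules(TABLE, CONDITIONS, "autism"):
        print(rule)
```

In this toy table, objects 0 and 1 are indiscernible on `geneA` and `geneB` yet carry different decisions, so they fall into the boundary region (upper minus lower approximation) and yield no certain rule; only object 3 produces a certain rule for the "autism" class.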

Apollo (Cambridge)

R.ROSETTA: an interpretable machine learning framework.

Author: Baltzer Nicholas
Bornelöv Susanne
Diamanti Klev
Feuk Lars
Garbulowski Mateusz
Komorowski Jan
Smolińska Karolina
Stoll Patricia
Øhrn Aleksander
Publication venue: BMC Bioinformatics
Publication date: 01/01/2021

Repository for Publications and Research Data

Publikationer från Uppsala Universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Apollo (Cambridge)

NORA - Norwegian Open Research Archives

Cancer LncRNA Census reveals evidence for deep functional conservation of long noncoding RNAs in tumorigenesis.

Author: Abascal Federico
Amin Samirkumar B.
Bader Gary D.
Barenboim Jonathan
Beroukhim Rameen
Bertl Johanna
Boroevich Keith A.
Brunak Soren
Campbell Peter J.
Carlevaro-Fita Joana
Chakravarty Dimple
Chan Calvin Wing Yiu
Chen Ken
Choi Jung Kyoon
Deu-Pons Jordi
Dhingra Priyanka
Diamanti Klev
Feuerbach Lars
Fink J. Lynn
Fonseca Nuno A.
Frigola Joan
Gambacorti-Passerini Carlo
Garsed Dale W.
Gerstein Mark
Getz Gad
Gonzalez-Perez Abel
Guo Qianyun
Gut Ivo G.
Haan David
Hamilton Mark P.
Haradhvala Nicholas J.
Harmanci Arif O.
Helmy Mohamed
Herrmann Carl
Hess Julian M.
Hobolth Asger
Hodzic Ermin
Hong Chen
Hornshoj Henrik
Isaev Keren
Izarzugaza Jose M. G.
Johnson Rory
Johnson Todd A.
Juul Malene
Juul Randi Istrup
Kahles Andre
Kahraman Abdullah
Kellis Manolis
Khurana Ekta
Kim Jaegil
Kim Jong K.
Kim Youngwook
Komorowski Jan
Korbel Jan O.
Kumar Sushant
Lanzos Andres
Larsson Erik
Lawrence Michael S.
Lee Donghoon
Lehmann Kjong-Van
Li Shantao
Li Xiaotong
Lin Ziao
Liu Eric Minwei
Lochovsky Lucas
Lou Shaoke
Madsen Tobias
Marchal Kathleen
Martincorena Inigo
Martinez-Fundichely Alexander
Maruvka Yosef E.
Mas-Ponte David
McGillivray Patrick D.
Meyerson William
Muinos Ferran
Mularoni Loris
Nakagawa Hidewaki
Nielsen Morten Muhlig
Paczkowska Marta
Park Keunchil
Park Kiejung
Pedersen Jakob Skou
Pich Oriol
Pons Tirso
Pulido-Tamayo Sergio
Raphael Benjamin J.
Reimand Juri
Reyes-Salazar Iker
Reyna Matthew A.
Rheinbay Esther
Rubin Mark A.
Rubio-Perez Carlota
Sabarinathan Radhakrishnan
Sahinalp S. Cenk
Saksena Gordon
Salichos Leonidas
Sander Chris
Schumacher Steven E.
Shackleton Mark
Shapira Ofer
Shen Ciyue
Shrestha Raunak
Shuai Shimin
Sidiropoulos Nikos
Sieverling Lina
Sinnott-Armstrong Nasa
Stein Lincoln D.
Stuart Joshua M.
Tamborero David
Tiao Grace
Tsunoda Tatsuhiko
Umer Husen M.
Uuskula-Reimand Liis
Valencia Alfonso
Vazquez Miguel
Verbeke Lieven P. C.
von Mering Christian
Wadelius Claes
Wadi Lina
Wang Jiayin
Warrell Jonathan
Waszak Sebastian M.
Weischenfeldt Joachim
Wheeler David A.
Wu Guanming
Yu Jun
Zhang Jing
Zhang Xuanping
Zhang Yan
Zhao Zhongming
Zou Lihua
Publication venue: Commun Biol
Publication date: 01/01/2020

Long non-coding RNAs (lncRNAs) are a growing focus of cancer genomics studies, creating the need for a resource of lncRNAs with validated cancer roles. Furthermore, it remains debated whether mutated lncRNAs can drive tumorigenesis, and whether such functions could be conserved during evolution. Here, as part of the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium, we introduce the Cancer LncRNA Census (CLC), a compilation of 122 GENCODE lncRNAs with causal roles in cancer phenotypes. In contrast to existing databases, CLC requires strong functional or genetic evidence. CLC genes are enriched amongst driver genes predicted from somatic mutations, and display characteristic genomic features. Strikingly, CLC genes are enriched for driver mutations from unbiased, genome-wide transposon-mutagenesis screens in mice. We identified 10 tumour-causing mutations in orthologues of 8 lncRNAs, including LINC-PINT and NEAT1, but not MALAT1. Thus, CLC represents a dataset of high-confidence cancer lncRNAs. Mutagenesis maps are a novel means of identifying deeply conserved roles of lncRNAs in tumorigenesis.

Repository for Publications and Research Data

DSpace@MIT

Lund University Publications

Publikationer från Uppsala Universitet

Ghent University Academic Bibliography

Digitala Vetenskapliga Arkivet - Academic Archive On-line

UPF Digital Repository

Apollo (Cambridge)

Bern Open Repository and Information System (BORIS)

Analyses of non-coding somatic drivers in 2,658 cancer whole genomes.

Author: Abascal Federico
Akdemir Kadir C.
Alvarez Eva G.
Amin Samirkumar B.
Bader Gary D.
Baez-Ortega Adrian
Bandopadhayay Pratiti
Barenboim Jonathan
Beroukhim Rameen
Bertl Johanna
Boroevich Keith A.
Boutros Paul C.
Bowtell David D. L.
Brors Benedikt
Brunak Soren
Burns Kathleen H.
Busanovich John
Campbell Peter J.
Carlevaro-Fita Joana
Chakravarty Dimple
Chan Calvin Wing Yiu
Chan Kin
Chen Ken
Choi Jung Kyoon
Cortes-Ciriano Isidro
Craft David
Deu-Pons Jordi
Dhingra Priyanka
Diamanti Klev
Dueso-Barroso Ana
Dunford Andrew J.
Edwards Paul A.
Estivill Xavier
Etemadmoghadam Dariush
Feuerbach Lars
Fink J. Lynn
Fonseca Nuno A.
Frenkel-Morgenstern Milana
Frigola Joan
Gambacorti-Passerini Carlo
Garsed Dale W.
Gerstein Mark
Getz Gad
Gonzalez-Perez Abel
Gordenin Dmitry A.
Guo Qianyun
Gut Ivo G.
Haan David
Haber James E.
Hamilton Mark P.
Haradhvala Nicholas J.
Harmanci Arif O.
Helmy Mohamed
Herrmann Carl
Hess Julian M.
Hobolth Asger
Hodzic Ermin
Hong Chen
Hornshoj Henrik
Hutter Barbara
Imielinski Marcin
Isaev Keren
Izarzugaza Jose M. G.
Johnson Rory
Johnson Todd A.
Jones David T. W.
Ju Young Seok
Juul Malene
Juul Randi Istrup
Kahles Andre
Kahraman Abdullah
Kazanov Marat D.
Kellis Manolis
Khurana Ekta
Kim Jaegil
Kim Jong K.
Kim Youngwook
Klimczak Leszek J.
Koh Youngil
Komorowski Jan
Korbel Jan O.
Kumar Kiran
Kumar Sushant
Lanzos Andres
Larsson Erik
Lawrence Michael S.
Lee Donghoon
Lee Eunjung Alice
Lee Jake June-Koo
Lehmann Kjong-Van
Li Shantao
Li Xiaotong
Li Yilong
Lin Ziao
Liu Eric Minwei
Lochovsky Lucas
Lopez-Bigas Nuria
Lou Shaoke
Lynch Andy G.
Macintyre Geoff
Madsen Tobias
Marchal Kathleen
Markowetz Florian
Martincorena Inigo
Martinez-Fundichely Alexander
Maruvka Yosef E.
McGillivray Patrick D.
Meyerson Matthew
Meyerson William
Miyano Satoru
Muinos Ferran
Mularoni Loris
Nakagawa Hidewaki
Navarro Fabio C. P.
Nielsen Morten Muhlig
Ossowski Stephan
Paczkowska Marta
Park Keunchil
Park Kiejung
Park Peter J.
Pearson John V.
Pedersen Jakob Skou
Pich Oriol
Pons Tirso
Puiggros Montserrat
Pulido-Tamayo Sergio
Raphael Benjamin J.
Reimand Juri
Reyes-Salazar Iker
Reyna Matthew A.
Rheinbay Esther
Rippe Karsten
Roberts Nicola D.
Roberts Steven A.
Rodriguez-Martin Bernardo
Rubin Mark A.
Rubio-Perez Carlota
Sabarinathan Radhakrishnan
Sahinalp S. Cenk
Saksena Gordon
Salichos Leonidas
Sander Chris
Schumacher Steven E.
Scully Ralph
Shackleton Mark
Shapira Ofer
Shen Ciyue
Shrestha Raunak
Shuai Shimin
Sidiropoulos Nikos
Sieverling Lina
Sinnott-Armstrong Nasa
Stein Lincoln D.
Stewart Chip
Stuart Joshua M.
Tamborero David
Tiao Grace
Torrents David
Tsunoda Tatsuhiko
Tubio Jose M. C.
Umer Husen Muhammad
Uuskula-Reimand Liis
Valencia Alfonso
Vazquez Miguel
Verbeke Lieven P. C.
Villasante Izar
von Mering Christian
Waddell Nicola
Wadelius Claes
Wadi Lina
Wala Jeremiah A.
Wang Jiayin
Warrell Jonathan
Waszak Sebastian M.
Weischenfeldt Joachim
Wheeler David A.
Wu Guanming
Yang Lixing
Yao Xiaotong
Yoon Sung-Soo
Yu Jun
Zamora Jorge
Zhang Cheng-Zhong
Zhang Jing
Zhang Xuanping
Zhang Yan
Zhao Zhongming
Zou Lihua
Publication venue: Nature
Publication date: 01/01/2020

The discovery of drivers of cancer has traditionally focused on protein-coding genes [1-4]. Here we present analyses of driver point mutations and structural variants in non-coding regions across 2,658 genomes from the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium [5] of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA). For point mutations, we developed a statistically rigorous strategy for combining significance levels from multiple methods of driver discovery that overcomes the limitations of individual methods. For structural variants, we present two methods of driver discovery, and identify regions that are significantly affected by recurrent breakpoints and recurrent somatic juxtapositions. Our analyses confirm previously reported drivers [6,7], raise doubts about others and identify novel candidates, including point mutations in the 5' region of TP53, in the 3' untranslated regions of NFKBIZ and TOB1, focal deletions in BRD4 and rearrangements in the loci of AKR1C genes. We show that although point mutations and structural variants that drive cancer are less frequent in non-coding genes and regulatory sequences than in protein-coding genes, additional examples of these drivers will be found as more cancer genomes become available.

Publikationsserver der Universität Tübingen

Digitala Vetenskapliga Arkivet - Academic Archive On-line

UPF Digital Repository

Repository for Publications and Research Data

DSpace@MIT

Lund University Publications

Ghent University Academic Bibliography

Publikationer från Uppsala Universitet

UCL Discovery

Copenhagen University Research Information System

eScholarship - University of California

Apollo (Cambridge)

Bern Open Repository and Information System (BORIS)

University of St. Andrews - Pure

St Andrews Research Repository

Integrating multi-omics for type 2 diabetes : Data science and big data towards personalized medicine

Author: Diamanti Klev
Publication venue: 'Uppsala University'
Publication date: 01/01/2019

Type 2 diabetes (T2D) is a complex metabolic disease characterized by multi-tissue insulin resistance and failure of the pancreatic β-cells to secrete sufficient amounts of insulin. Cells recruit transcription factors (TFs) to specific genomic loci to regulate gene expression, which in turn affects protein and metabolite abundances. Here we investigated the interplay of transcriptional and translational regulation, and its impact on the metabolome and phenome, in several insulin-resistant tissues from T2D donors. We implemented computational tools and multi-omics integrative approaches that can facilitate the selection of candidate combinatorial markers for T2D. We developed a data-driven approach to identify putative regulatory regions and TF-interaction complexes. The cell-specific sets of regulatory regions were enriched for disease-related single nucleotide polymorphisms (SNPs), highlighting the importance of such loci for genomic stability and the regulation of gene expression. We employed a similar principle in a second study, where we integrated single-nucleus RNA sequencing (snRNA-seq) with bulk targeted chromosome-conformation capture (HiCap) and mass spectrometry (MS) proteomics from liver. We identified a putatively polymorphic site associated with the DPYD gene that may contribute to variation in the pharmacogenetics of fluoropyrimidine toxicity. Additionally, we found a complex regulatory network between a group of 16 enhancers and the SLC2A2 gene that has been linked to increased risk for hepatocellular carcinoma (HCC). Moreover, three enhancers harbored motif-breaking mutations located in regulatory regions in a cohort of 314 HCC cases, and were candidate contributors to malignancy. In a cohort of 43 multi-organ donors, we explored patterns of metabolite variation across visceral adipose tissue (VAT), pancreatic islets, skeletal muscle, liver and blood serum samples.
A large fraction of lysophosphatidylcholines (LPCs) decreased in muscle and serum of T2D donors, while a large number of carnitines increased in liver and blood of T2D donors, confirming that changes in metabolites occur in primary tissues and that their alterations in serum constitute a secondary event. Next, we associated metabolite abundances from 42 subjects with glucose uptake, fat content and volume of various organs measured by positron emission tomography/magnetic resonance imaging (PET/MRI). The fat content of the liver was positively associated with the amino acid tyrosine and negatively associated with LPC(P-16:0). The insulin sensitivity of VAT and subcutaneous adipose tissue was positively associated with several LPCs, while the opposite applied to branched-chain amino acids. Finally, we presented a network visualization of a rule-based machine learning model that predicted non-diabetes and T2D in an “unseen” dataset with 78% accuracy.

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Integrating multi-omics for type 2 diabetes : Data science and big data towards personalized medicine

Author: Diamanti Klev
Publication venue: 'Uppsala University'
Publication date: 01/01/2019

Publikationer från Uppsala Universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Supplementary tables: MetaFetcheR: An R package for complete mapping of small compound data

Author: Csombordi Rajmund
Diamanti Klev
Komorowski Jan
Yones Sara A.
Publication venue: Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland.; Washington National Primate Research Center, Seattle, WA, USA.
Publication date: 01/01/2021

Small-compound databases contain a large amount of information on metabolites and metabolic pathways. However, the plethora of such databases and the redundancy of their information lead to major issues with analysis and standardization. Failing to establish means of data access at the early stages of a project might lead to mislabelled compounds, reduced statistical power and large delays in delivery of results. We developed MetaFetcheR, an open-source R package that links metabolite data from several small-compound databases, resolves inconsistencies and covers a variety of data-fetching use cases. By benchmarking the algorithm in three independent case studies based on two published datasets, we showed that MetaFetcheR performed better than existing approaches and databases.

Publikationer från Uppsala Universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line

MetaFetcheR : An R Package for Complete Mapping of Small-Compound Data

Author: Csombordi Rajmund
Diamanti Klev
Komorowski Jan
Yones Sara A.
Publication venue: 'MDPI AG'
Publication date: 01/01/2021

Small-compound databases contain a large amount of information on metabolites and metabolic pathways. However, the plethora of such databases and the redundancy of their information lead to major issues with analysis and standardization. Failing to establish means of data access at the early stages of a project might lead to mislabelled compounds, reduced statistical power, and large delays in delivery of results. We developed MetaFetcheR, an open-source R package that links metabolite data from several small-compound databases, resolves inconsistencies, and covers a variety of data-fetching use cases. By benchmarking the algorithm in three independent case studies based on two published datasets, we showed that MetaFetcheR performed better than existing approaches and databases.
Title in thesis list of papers: MetaFetcheR: An R package for complete mapping of small compound data

Publikationer från Uppsala Universitet

Directory of Open Access Journals

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Multifaceted regulation of hepatic lipid metabolism by YY1

Author: Cavalli Marco
Diamanti Klev
Gutierrez Ariadna Lara
Komorowski Jan
Pan Gang
Wadelius Claes
Publication venue: 'Life Science Alliance, LLC'
Publication date: 01/01/2021

Recent studies have suggested that dysregulated YY1 plays a pivotal role in many liver diseases. To obtain a detailed view of the genes and pathways regulated by YY1 in the liver, we carried out RNA sequencing in HepG2 cells after YY1 knockdown. A stringent set of 2,081 differentially expressed genes was identified by comparing the YY1-knockdown samples (n = 8) with the control samples (n = 14). YY1 knockdown significantly decreased the expression of several key transcription factors and their coactivators in lipid metabolism. This is illustrated by YY1 regulating PPARA expression through binding to its promoter and enhancer regions. Our study further suggests that down-regulation of these key transcription factors, together with YY1 knockdown, significantly decreased the cooperation between YY1 and these transcription factors at various regulatory regions that are important for regulating the expression of genes in hepatic lipid metabolism. This was supported by the finding that the expression of SCD and ELOVL6, which encode key enzymes in lipogenesis, was regulated by the cooperation between YY1 and the PPARA/RXRA complex at their promoters.

Publikationer från Uppsala Universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line